Qy 169 ASKI FNCRMEWEKVERGRRTSLCTHDPAKI CSRDH AQSSATWSCSQP 215 

Ml I I : : lh II II I I hi =11=11 

Db 1358 CKKIFAC KYKECNKRFLCSKALAKHCSDSHNLDHIEEPKVTiSEAGSAARFSCNQP 1412 

Qy 216 FKWCVY I AFYSTDYRLVQ KVCPDYNYHSD 245 

: |: | : |:: :: || | | 

Db 1413 - QCPAVF YTFNKLKHHLMEQHNI EGE I HSDYE I HCD 1447 
Pred. No. is the number of results predicted by chance to have a
score greater than or equal to the score of the result being printed,
and is derived by analysis of the total score distribution.

ALIGNMENTS



RESULT 1 
S35219 

homeotic protein Hox C4 - mouse 

N;Alternate names: homeotic protein Hox 3.5; homeotic protein MAB87 
C;Species: Mus musculus (house mouse)

C;Date: 10-Dec-1993 #sequence_revision 03-Aug-1995 #text_change 22-Jun-1999 
C;Accession: S35219; A49153; C41606; I49752

R;Goto, J.; Miyabayashi, T.; Wakamatsu, Y.; Takahashi, N.; Muramatsu, M.
Mol. Gen. Genet. 239, 41-48, 1993 



A;Title: Organization and expression of mouse Hox3 cluster genes. 

A;Reference number: S35219; MUID: 93288004; PMID: 8099712
A;Accession: S35219
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-264 <GOT>
A;Cross-references: GB:S62287; NID:g385749; PIDN:AAB27153.1; PID:g385750;
GB:D11328; NID:g406212; PID:g416420

A;Note: entry MUSHOX35A in GenBank release 103 duplicates GenBank entry S62287 
except for an incorrect reference citation 

R;Geada, A.M.; Gaunt, S.J.; Azzawi, M.; Shimeld, S.M.; Pearce, J.; Sharpe, P.T.
Development 116, 497-506, 1992
A;Title: Sequence and embryonic expression of the murine Hox-3.5 gene.
A;Reference number: A49153; MUID: 93161956; PMID: 1363091
A;Accession: A49153
A;Status: preliminary
A;Molecule type: mRNA
A;Residues: 1-79,'G',81-95,'S',97-264 <GEA>
A;Cross-references: GB:X69019; NID:g396183; PIDN:CAA48784.1; PID:g396184
A;Note: sequence extracted from NCBI backbone (NCBIN:124829, NCBIP:124830)

R;Murtha, M.T.; Leckman, J.F.; Ruddle, F.H. 

Proc. Natl. Acad. Sci. U.S.A. 88, 10711-10715, 1991 

A;Title: Detection of homeobox genes in development and evolution. 

A;Reference number: A41606; MUID: 92073357; PMID: 1720547
A;Accession: C41606
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 177-201 <MUR>
A;Cross-references: GB:M81660; NID:g193975; PIDN:AAA63313.1; PID:g193976
C;Genetics:
A;Gene: Hoxc-4; Hox 3.5
A;Map position: 15
A;Introns: 147/1
C;Function:
A;Description: control of embryonic development by tissue- and stage-specific
regulation of transcription
C;Superfamily: homeotic protein Hox D4; homeobox homology
C;Keywords: DNA binding; embryo; homeobox; nucleus; transcription regulation
F;157-213/Domain: homeobox homology <HOX>

Query Match 7.0%; Score 97.5; DB 1; Length 264; 

Best Local Similarity 22.6%; Pred. No. 0.55; 

Matches 48; Conservative 25; Mismatches 90; Indels 49; Gaps 9; 

Qy 27 PPGSEDPERD DHEGQPRPRVPRKRGHI S P - KSRPMAN - STLLGLLAPPGEAWG I 78 

II III II h II I Ihh : I I I I I 

Db 57 PPRPSYPERQYSCTSLQGPGNSRAHGPAQAGHHHPEKSQPLCEPAPLSGTSASPSPAPPA 116 

Qy 79 LGQPPNRPNHSPPPSAKVKKI FGWGDFYSNI KTVALNLLVTGKI VDHGNGTFS 131 

II hi -I =: I I = I I 

Db 117 CSQP- -APDHPSSAASKQPIVYPW MKKIHVSTVNPNYNGGEPKRSRTAYTRQQVLEL 171 

Qy 132 VHFQHNATGQGNI SI SLVPPSKAVEFHQEQQI FI EAKASKI F - -NCRMEWEKVERGR 186 

h I : I h : : : ||: | ||:|:| | 

Db 172 EKEFHYNRYLTRRRRIEIA HSLCLSERQIKIWFQNRRMKWKKDHRLP 218 

Qy 187 RTSLCTHDPAKI C SRDHAQSS 207 



Db 219 NTKVRSAPPAGAAPSTLSAATPGTSEDHSQSA 250 
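
The summary counts reported for this result appear to determine its Best Local Similarity figure: dividing Matches by the total number of alignment columns (Matches + Conservative + Mismatches + Indels) reproduces the reported 22.6%. The Python sketch below checks that reading. The function name is illustrative, and the formula is inferred from the numbers in this report, not taken from GenCore documentation.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of alignment columns that are identical matches.

    Assumption: the denominator is the sum of the four reported counts.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts reported for RESULT 1 (S35219): 48 / 212 columns
pct = best_local_similarity(48, 25, 90, 49)
print(f"{pct:.1f}%")
```

The same calculation also agrees with the figures printed for the other results in this listing, for example 59 matches over 273 columns for the human collagen alpha 1 (V) hit.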



RESULT 2 
CGHU1V 

collagen alpha 1 (V) chain precursor - human 
N;Alternate names: procollagen alpha 1 (V) chain 
C;Species: Homo sapiens (man)

C;Date: 22-Nov-1993 #sequence_revision 03-Oct-1995 #text_change 16-Jun-2000 
C;Accession: S18802; S16024; A61142; S11303; S03978; S43642; S58665 
R;Greenspan, D.S.; Cheng, W.; Hoffman, G.G.
J. Biol. Chem. 266, 24727-24733, 1991
A;Title: The pro-alpha1 (V) collagen chain. Complete primary structure,
distribution of expression, and comparison with the pro-alpha1 (XI) collagen
chain.
A;Reference number: S18802; MUID: 92105142; PMID: 1722213
A;Accession: S18802
A;Molecule type: mRNA
A;Residues: 1-1838 <GRE>
A;Cross-references: GB:M76729; NID:g189519; PIDN:AAA59993.1; PID:g189520
R;Takahara, K.; Sato, Y.; Okazawa, K.; Okamoto, N.; Noda, A.; Yaoi, Y.; Kato, I.
J. Biol. Chem. 266, 13124-13129, 1991
A;Title: Complete primary structure of human collagen alpha-1 (V) chain.
A;Reference number: S16024; MUID: 91302336; PMID: 2071595
A;Accession: S16024
A;Molecule type: mRNA
A;Residues: 1-81,'QL',84-389,'A',391-676,'K',678-1294,'PS',1297,'RS',1300-
1553,'R',1555-1812,'V',1814-1838 <TAK>
A;Cross-references: GB:D90279; NID:g219509; PIDN:BAA14323.1; PID:g219510
A;Note: parts of this sequence were determined by protein sequencing
R;Yaoi, Y.; Hashimoto, K.; Takahara, K.; Kato, I.
Exp. Cell Res. 194, 180-185, 1991 

A;Title: Insulin binds to type V collagen with retention of mitogenic activity. 

A;Reference number: A61142; MUID: 91224163; PMID: 1709100
A;Accession: A61142
A;Molecule type: protein
A;Residues: 823-824,'X',826-842 <YAO>
A;Note: the residue designated 'X' is probably glycosylated hydroxylysine; this
cyanogen bromide fragment contains an uncharacterized growth hormone-binding
region
R;Yaoi, Y.; Hashimoto, K.; Koitabashi, H.; Takahara, K.; Ito, M.; Kato, I.
Biochim. Biophys. Acta 1035, 139-145, 1990
A;Title: Primary structure of the heparin-binding site of type V collagen.
A;Reference number: S11303; MUID: 90366601; PMID: 2203476
A;Accession: S11303
A;Molecule type: protein
A;Residues: 823-824,'X',826-848,'I',850-851,'P',853,'PR',856-893,'D',895-
932,'X',934-950 <YA2>
A;Note: the residues designated 'X' are probably glycosylated hydroxylysine;
this sequence revised by S16024
R;Seyer, J.M.; Kang, A.H.

Arch. Biochem. Biophys. 271, 120-129, 1989 

A;Title: Covalent structure of collagen: amino acid sequence of three cyanogen 
bromide-derived peptides from human alpha-1 (V) collagen chain. 
A;Reference number: S03978; MUID: 89227189; PMID: 2496661
A;Accession: S03978
A;Molecule type: protein
A;Residues: 621-640,'G',642-649,'L',651-662,'E',664-667,'Q',669-676,'Q',678-
683,'P',685-691,'VT',694,'E',696-697,'AP',700-726,'Q',728-740,'L',742-
746,'Q',748-752,'A',754-758,'N',760-775,'QK',778-822 <SEY>
A;Note: there are a number of inconsistencies between the sequences in figures 6
and 7; the sequence in figure 7 is given

R;Moradi-Ameli, M.; Rousseau, J.C.; Kleman, J.P.; Champliaud, M.F.; Boutillon,
M.M.; Bernillon, J.; Wallach, J.; van der Rest, M.
Eur. J. Biochem. 221, 987-995, 1994
A;Title: Diversity in the processing events at the N-terminus of type-V
collagen.
A;Reference number: S43642; MUID: 94237164; PMID: 8181482
A;Accession: S43642
A;Molecule type: protein
A;Residues: 565-576;756-758,'X',760-763,'X',765-772;1012-1029;1219-1232;1465-
1474,'X',1476-1477 <MOR>

R;Fessler, L.I.; Brosh, S.; Chapin, S.; Fessler, J.H.
J. Biol. Chem. 261, 5034-5040, 1986
A;Title: Tyrosine sulfation in precursors of collagen V.
A;Reference number: A56977; MUID: 86168226; PMID: 3082875
A;Contents: annotation; identification of tyrosine sulfate in the amino-terminal
propeptide
R;Lee, S.; Greenspan, D.S.
Biochem. J. 310, 15-22, 1995
A;Title: Transcriptional promoter of the human alpha-1(V) collagen gene
(COL5A1).
A;Reference number: S58665; MUID: 95374437; PMID: 7646438
A;Accession: S58665
A;Status: preliminary; not compared with conceptual translation
A;Molecule type: DNA
A;Residues: 1-36 <LEE>
A;Cross-references: GB:L38808; NID:g1020325; PIDN:AAA79853.1; PID:g1020326
C;Comment: Prolines and lysines at the third position of the tripeptide
repeating unit (G-X-Y) are hydroxylated to varying extents. Prolines are
predominately 4-hydroxylated. About 50% of the lysines are 5-hydroxylated and
subsequently O-glycosylated.
C;Comment: A long form of the mature protein containing part of the amino-
terminal propeptide has been detected but not characterized. The homotrimer is
probably fully processed to the short form, while the heterotrimers are probably
processed to the long form.
C;Genetics:
A;Gene: GDB:COL5A1
A;Cross-references: GDB:131457; OMIM:120215
A;Map position: 9q34.2-9q34.3
C;Complex: type V collagen may be a homotrimer of alpha 1 (V) chains, a
heterotrimer of two alpha 1 (V) chains and one alpha 2 (V) chain (see PIR:CGHU2V),
or a heterotrimer of one alpha 1 (V) chain, one alpha 2 (V) chain and one alpha
3(V) chain, initially linked by disulfide bonds among their carboxyl-terminal
propeptides; a polymer of collagen trimers, offset by approximately one-quarter
of their length, is formed with desmosine cross-links made from lysine and
allysine residues
C;Function:
A;Description: structural component of extracellular fibrous polymer associated
with cell surfaces and interstitial fibrils; widely distributed but least
abundant of the fibrillar collagens
A;Note: may play a role in controlling the lateral growth of collagen I fibrils
C;Superfamily: collagen alpha 1 (V) chain; fibrillar collagen carboxyl-terminal
homology
C;Keywords: coiled coil; extracellular matrix; glycoprotein; hydroxylysine;
hydroxyproline; pyroglutamic acid; sulfoprotein; trimer; triple helix

F;1-37/Domain: signal sequence #status predicted <SIG>
F;36-261/Domain: PARP-like #status predicted <PARP>
F;38-541/Domain: amino-terminal propeptide #status predicted <PRO>
F;542-1605/Product: collagen alpha 1 (V) chain, short form #status predicted
<MAT>
F;542-558/Region: amino-terminal nonhelical telopeptide
F;559-1572/Region: helical
F;645-647/Region: cell attachment (R-G-D) motif
F;663-665/Region: cell attachment (R-G-D) motif
F;897-929/Region: heparin binding
F;1573-1605/Region: carboxyl-terminal nonhelical telopeptide
F;1606-1838/Domain: carboxyl-terminal propeptide #status predicted <CPR>
F;1615-1837/Domain: fibrillar collagen carboxyl-terminal homology <FCC>
F;38/Modified site: pyrrolidone carboxylic acid (Gln) (in mature form) #status
predicted
F;62-244,183-237/Disulfide bonds: #status predicted
F;159,176,385,1672,1741/Binding site: carbohydrate (Asn) (covalent) #status
predicted
F;234,236,240,262,263,273,274,275,277,279,280,338,340,346,347,352,357,416,417,
420,421/Binding site: sulfate (Tyr) (covalent) #status predicted
F;535/Modified site: allysine (Lys) #status predicted
F;541-542/Cleavage site: Ala-Gln (procollagen N-endopeptidase) #status predicted
F;542/Modified site: pyrrolidone carboxylic acid (Gln) (in mature form) #status
predicted
F;570,576,621,639,648,654,657,675,678,690,693,696,705,717,720,726,732,741,750,
753,756,762,765,771,780,789,816,834,870,873,876,888,891,903,906,930,945,1017,1020,
1023,1029,1221,1224,1467,1470/Modified site: 4-hydroxyproline (Pro) #status
experimental
F;627,642,687,708,744,774,795,804,807,810,819,825,846,864,882,897/Modified site:
5-hydroxylysine (Lys) #status experimental
F;627,642,687,774,795,804,807,810,819,825,846,864,882,897,1482/Binding site:
carbohydrate (Lys) (covalent) #status predicted
F;708,744/Binding site: carbohydrate (Lys) (covalent) #status experimental
F;1482/Modified site: 5-hydroxylysine (Lys) #status predicted
F;1605-1606/Cleavage site: Ala-Asp (procollagen C-endopeptidase) #status
predicted
F;1639,1645,1662,1671/Disulfide bonds: interchain #status predicted
F;1680-1835,1746-1789/Disulfide bonds: #status predicted

Query Match 6.9%; Score 96; DB 1; Length 1838; 

Best Local Similarity 21.6%; Pred. No. 7.8; 

Matches 59; Conservative 23; Mismatches 71; Indels 120; Gaps 13; 

Qy 25 DGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGE 74 

I I I I I I I =1 I II = hlh I I I I 

Db 1458 DGPPGPMGP -PGLPGLKGDSGPKGE-KGHPGLIGLIGPPGEQGEKGDRGLP 1506 

Qy 75 AWGILGQ PPNRPNHSPPPSAKVKKIFGW 102 

III III II I I 

Db 1507 GPQGSSGPKGEQGITGPSGPIGPPGPPGLPGPPGPKGAKGSSGPTGPKGEAGHPGPPGPP 1566 

Qy 103 GDFYSNI KTVALNL LVTGKI VDHGNGTFSVHFQHNATGQGNI S I SLVPPSKAVE 156 

h : I : :::| III I = II 



Db 1567 GPPGEVI QPLPI QASRTRRNI DASQLLDDGNGENYVDY ADGM 1608 

Qy 157 FHQEQQI FI EAKASKI FNCRMEWEKVERGRRT SLCTH DP 195 

::|| : : : : | | : : : | | || | || 

Db 1609 EEIF GSLNSLKLE I EQMKRPLGTQQNPARTCKDLQLC - HPDFPDGE YWVDP 1658 

Qy 196 AKI CSRDHAQSSATWSCSQPFKWCVYI AFYST 228 

: I I I I III I : I II 

Db 1659 NQGCSRD SFKVYCNFTAGGST 1679 

RESULT 3 
S18803 

collagen alpha 1 (V) chain - hamster 

C;Species: Cricetinae gen. sp. (hamster)
C;Date: 19-Mar-1997 #sequence_revision 24-Jul-1997 #text_change 16-Dec-1998
C;Accession: S18803
R;Greenspan, D.S.; Cheng, W.; Hoffman, G.G.
J. Biol. Chem. 266, 24727-24733, 1991
A;Title: The pro-alpha1 (V) collagen chain. Complete primary structure,
distribution of expression, and comparison with the pro-alpha1 (XI) collagen
chain.
A;Reference number: S18802; MUID: 92105142; PMID: 1722213
A;Accession: S18803
A;Status: preliminary
A;Molecule type: mRNA
A;Residues: 1-1843 <GRE>
C;Superfamily: collagen alpha 1 (V) chain; fibrillar collagen carboxyl-terminal
homology
F;1620-1842/Domain: fibrillar collagen carboxyl-terminal homology <FCC>

Query Match 6.8%; Score 94; DB 2; Length 1843; 

Best Local Similarity 24.0%; Pred. No. 12; 

Matches 60; Conservative 24; Mismatches 92; Indels 74; Gaps 10; 

Qy 25 DGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGE 74

Mill I I II : I : I I : I I I I 

Db 1463

Qy 75 -AWGILGQ PPNRPNHSPPPSAKVKKI FGWGDFYSNI KTVALNLLVT 119

III III II I I : | | : : 

Db 1512 3EQGITGPSGPLGPPGPPGLPGPPGPKGAK GSSGPTGPKGEAGHPGLP 1568

Qy 120

| : | : || | : | | : = :| = = ::| 

Db 1569

Qy 180 EKVERGRRT SLCTH DP AKI CSRDHAQSSATWSCSQPFKV 218

I I Ml || : M I I III 

Db 1628 EQMKRPLGTQQNPARTCKDLQLC -H PDF PDGEYVJVDPNQGCSRD SFKV 1674

Qy 219

hi I 

Db 1675


RESULT 4 



VCVWEK 

env polyprotein - AKV murine leukemia virus 

N;Contains: knob protein gp76; R protein; spike protein p15E
C;Species: AKV murine leukemia virus
C;Date: 05-Apr-1983 #sequence_revision 03-Aug-1984 #text_change 16-Jul-1999
C;Accession: A92995; A93448; A03984
R;Herr, W.
J. Virol. 49, 471-478, 1984
A;Title: Nucleotide sequence of AKV murine leukemia virus.
A;Reference number: A92995; MUID: 84115072; PMID: 6319746
A;Accession: A92995
A;Molecule type: genomic RNA
A;Residues: 1-669 <HER>
A;Cross-references: GB:J01998; GB:J01999; GB:K00016; GB:K00017; GB:K00018;
GB:K01394; NID:g331993; PIDN:AAB03092.1; PID:g331996

R;Herr, W.; Corbin, V.; Gilbert, W.
Nucleic Acids Res. 10, 6931-6944, 1982
A;Title: Nucleotide sequence of the 3' half of AKV.
A;Reference number: A93448; MUID: 83090450; PMID: 6294621
A;Accession: A93448
A;Molecule type: DNA
A;Residues: 1-34,'R',36-462,'K',464-591,'K',593-669 <HE2>
C;Genetics:
A;Gene: env
C;Superfamily: type C retrovirus env polyprotein
C;Keywords: coat protein; glycoprotein; polyprotein; spike protein;
transmembrane protein
F;1-31/Domain: signal sequence #status predicted <SIG>
F;32-470/Product: knob protein gp76 #status predicted <KNB>
F;471-650/Product: spike protein p15E #status predicted <SPK>
F;651-669/Product: R protein #status predicted <RPT>
F;43,199,327,359,399/Binding site: carbohydrate (Asn) (covalent) #status
predicted

Query Match 6.7%; Score 92.5; DB 1; Length 669; 

Best Local Similarity 23.8%; Pred. No. 4.6; 

Matches 62; Conservative 24; Mismatches 78; Indels 97; Gaps 15; 
Qy 24 DDGP--PGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQ 81 

.III! Ml I || I I | 

Db 261 DSGPRVPIGPNPVLSDRRPPSRPRPTR SPPPSNST PTET 299 

Qy 82 P PNRPNHS PP PSAKVKKI FGWGDF YSN I KTV- - ALNL LVTGKIVDHG- 126 

I I llh : I I I I I Ihl I 

Db 3 00 PLTLP- -EPPPAGVENRLL NLVKGAYQALNLTSPDKTQECWLCLVSGPPYYEGV 351 

Qy 127 - -NGTFSVH FQH NATGQGNI SISLVPPSKAVEFHQEQQI FI EAKA 169 

Ihl I II I I I I : I II : I : h 
Db 352 AVLGTYSNHTSAPANCSVASQHKLTLSEVTGQG-LCIGAVPKTHQVLCNTTQK 4 03 

Qy 170 SKIFNCRMEWEKVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPF-KWCVYIAFYST 228 

11= : I : I I I I : I :| 

Db 4 04 TSDGSYYLA APTGTTWACSTGLTPCISTTILDLTT 438 

Qy 229 DYRLVQKVCPDYNYHSDTPYY 24 9 

II - - I III = I 

Db 439 DYCVLVELWPRVTYHSPSYVY 459 



RESULT 5 
D40750 

proline-rich protein PRB1/2S (EA) - human (fragment) 
C;Species: Homo sapiens (man) 

C;Date: 19-May-1994 #sequence_revision 19-May-1994 #text_change 03-May-1996
C;Accession: D40750
R;Azen, E.A.; Latreille, P.; Niece, R.L.
Am. J. Hum. Genet. 53, 264-278, 1993
A;Title: PRB1 gene variants coding for length and null polymorphisms among human
salivary Ps, PmF, PmS, and Pe proline-rich proteins (PRPs).
A;Reference number: A40750; MUID: 93304421; PMID: 8317492
A;Accession: D40750
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-117 <AZE>
A;Cross-references: GB:S62930
C;Superfamily: proline-rich protein

Query Match 6.6%; Score 91; DB 2; Length 117; 

Best Local Similarity 31.0%; Pred. No. 0.75; 

Matches 22; Conservative 7; Mismatches 32; Indels 10; Gaps 1; 

Qy 22 GQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQ 81 

I III II I =| h | III | | 

Db 41 GKPQGPPPQGDKSRSPRSPPGKPQGPPPQGGNQPQGPP SPPGKPQGPPPQ 90 

Qy 82 PPNRPNHSPPP 92

I I I I II 
Db 91 GGNRPQGPPPP 101 



RESULT 6 
WJHU3E 

homeotic protein Hox C4 - human 

N;Alternate names: homeotic protein cp19; homeotic protein cp8; homeotic protein
Hox 3E
C;Species: Homo sapiens (man)

C;Date: 30-Sep-1991 #sequence_revision 30-Sep-1991 #text_change 22-Jun-1999 
C;Accession: S01030; S15545 

R;Simeone, A.; Pannese, M.; Acampora, D.; d'Esposito, M.; Boncinelli, E.
Nucleic Acids Res. 16, 5379-5390, 1988
A;Title: At least three human homeoboxes on chromosome 12 belong to the same
transcription unit.
A;Reference number: S01030; MUID: 88262550; PMID: 2898768
A;Accession: S01030
A;Molecule type: mRNA
A;Residues: 1-264 <SIM>
A;Cross-references: EMBL:X07495; NID:g32385; PIDN:CAA30376.1; PID:g32386
A;Note: the sequence from Fig. 4 is inconsistent with that from Fig. 3 in
lacking 108-Ala and 109-Ser and in having 259-Gly
R;Boncinelli, E.; Acampora, D.; Pannese, M.; d'Esposito, M.; Somma, R.; Gaudino,
G.; Stornaiuolo, A.; Cafiero, M.; Faiella, A.; Simeone, A.
Genome 31, 745-756, 1989
A;Title: Organization of human class I homeobox genes.
A;Reference number: S15036; MUID: 90215256; PMID: 2576652
A;Accession: S15545
A;Molecule type: DNA
A;Residues: 156-221 <BON>
C;Genetics:
A;Gene: GDB:HOXC4
A;Cross-references: GDB:120672; OMIM:142974
A;Map position: 12q13.3-12q13.3
C;Function:
A;Description: control of embryonic development by tissue- and stage-specific
regulation of transcription
C;Superfamily: homeotic protein Hox D4; homeobox homology
C;Keywords: alternative splicing; DNA binding; embryo; homeobox; nucleus;
transcription regulation
F;157-213/Domain: homeobox homology <HOX>

Query Match 6.5%; Score 90.5; DB 1; Length 264; 

Best Local Similarity 22.2%; Pred. No. 2.2; 

Matches 47; Conservative 25; Mismatches 91; Indels 49; Gaps 9; 

Qy 27 PPGSEDPERD DHEGQPRPRVPRKRGHI SP - KSRPMAN - STLLGLLAPPGEAWGI 78 

II III I I I : II I Ih : : I I I I I 

Db 57 PPRPSYPERQYSCTSLQGPGNSRGHGPAQAGHHHPEKSQSLCEPAPLSGASASPSPAPPA 116 

Qy 79 LGQPPNRPNHSPPPSAKVKKI FGWGDFYSNI KTVALNLLVTGKI VDHGNGTFS 131 

II hi = = | :: | I : I I 

Db 117 CSQP - - APDHPSSAAS KQP I VYPW MKKI HVSTVNPNYNGGEPKRSRAAYTRQQVLEL 171 

Qy 132 VHFQHNATGQGNI S I SLVPPSKAVEFHQEQQI FI EAKASKI F - -NCRMEWEKVERGR 186 

h I : I h : : : ||: | ||:|:| | 

Db 172 EKEFHYNRYLTRRRRI EIA HSLCLSERQI KIWFQNRRMKWKKDHRLP 218 

Qy 187 RTSLCTHDPAKIC SRDHAQSS 2 07 

I = : II I Ihlh 

Db 219 NTKVRSAPPAGAAPSTLSAATPGTSEDHSQSA 250 

RESULT 7 
JG0183 

myosin Myok - Dictyostelium discoideum
C;Species: Dictyostelium discoideum
C;Date: 23-Jul-1999 #sequence_revision 23-Jul-1999 #text_change 17-May-2002
C;Accession: JG0183
R;Yazu, M.; Adachi, H.; Sutoh, K.
Biochem. Biophys. Res. Commun. 255, 711-716, 1999
A;Title: Novel Dictyostelium unconventional myosin Myok is a class I myosin with
the longest loop-1 insert and the shortest tail.
A;Reference number: JG0183; MUID: 99160418; PMID: 10049776
A;Accession: JG0183
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-858 <YAZ>
A;Cross-references: DDBJ:AB017909
C;Superfamily: myosin heavy chain; myosin motor domain homology
F;10-807/Domain: myosin motor domain homology #status atypical <MMO>



Query Match 6.5%; Score 90.5; DB 2; Length 858; 

Best Local Similarity 23.7%; Pred. No. 9.2; 



Matches 46; Conservative 18; Mismatches 77; Indels 53; Gaps 10; 



Qy 26 GPP GSEDPERDDHEGQPRPRVPRKRGHISPKSR PMANSTLLGLLAP PGEAWG 77 

III I II I I I :|: II II hi I -III 

Db 176 GPPSRGGGPPPTRG- -RGGPPPPI PQNRGAPPPVSNGGAPPPVAR GPVAPPPTR-- 227 

Qy 78 ILGQPPNR PNHS P P PSAKVKKI FGWGDF YSN I KTV AL 114 

I II I III I I I = = III 

Db 228 - -GAPPTRGGGPANRGGRGGGPPP VSTSRGGGGYGGSSKTVDVEHI KKVILDSNPLM 282 

Qy 115 NLL VTGKI VDHGNGT FSVHFQHNATGQGNISISLVPPSKAVEFHQEQQ IFI 165 

: | | : | : : | | | : : : | | |: : || 

Db 283 EAIGNAKTVRNDNSSRFGKYLEIQFDDNNAPVGGLISTFLLEKTRVTFQQKNERNFHIFY 342 

Qy 166 EAKASKI FNCRMEW 179 

= M 

Db 343 QMLGGLDQTTKSEW 356 

RESULT 8 
T02885 

peroxisome proliferator-activated receptor gamma binding protein PBP165 - mouse
C;Species: Mus musculus (house mouse)
C;Date: 24-Mar-1999 #sequence_revision 24-Mar-1999 #text_change 08-Oct-1999
C;Accession: T02885
R;Zhu, Y.; Qi, C.; Jain, S.; Rao, M.S.; Reddy, J.K.
J. Biol. Chem. 272, 25500-25506, 1997
A;Title: Isolation and characterization of PBP, a protein that interacts with
peroxisome proliferator-activated receptor.
A;Reference number: Z14760; MUID: 97467333; PMID: 9325263
A;Accession: T02885
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: mRNA
A;Residues: 1-1560 <ZHU>
A;Cross-references: EMBL:AF000294; NID:g3411010; PIDN:AAC31118.1; PID:g3411011

Query Match 6.5%; Score 90; DB 2; Length 1560; 

Best Local Similarity 22.2%; Pred. No. 21; 

Matches 42; Conservative 29; Mismatches 70; Indels 48; Gaps 9; 

Qy 35 RDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSA 94 

:| II II h I I I I I 

Db 568 KDRHESV GHGEDFSKVSQNP I LTSLLQ I TGNGGSTI GSS PTP PHHTPP PVS 618 

Qy 95 KVKKI FGWGDFYSNIKT- -VALNLLVTGKIVDHGN- -GTFSVHFQHNATGQGNISI 146 

I I : :||| I h = h:: = | : = 

Db 619 SMA GNTKNHPMLMNLLKDNPAQDFSTLYGSSPLERQNSSSGSPRMEMCSGS 669 

Qy 147 SLVPPSKAVEFHQEQQIF 1 EAKASKI FNCRMEWEKVERGRRT- - 188 

I III I II : I =: = = : :|: | : :: | 

Db 670 NKAKKKKSSRVPPDKPK- -HQTEDDFQRELFSMDVDSQ-NPMFDVSMTADALDTPHITPA 726 

Qy 18 9 -SLCTHDPA 196 

I I II 
Db 727 PSQCSTPPA 735 



RESULT 9 
A46511 

envelope protein - AKV murine leukemia virus 
C;Species: AKV murine leukemia virus
C;Date: 18-Jun-1993 #sequence_revision 25-Apr-1997 #text_change 30-May-1997
C;Accession: A46511
R;Hayashi, H.; Matsubara, H.; Yokota, T.; Kuwabara, I.; Kanno, M.; Koseki, H.;
Isono, K.; Asano, T.; Taniguchi, M.
J. Immunol. 149, 1223-1229, 1992
A;Title: Molecular cloning and characterization of the gene encoding mouse
melanoma antigen by cDNA library transfection.
A;Reference number: A46511; MUID: 92364323; PMID: 1380036
A;Accession: A46511
A;Status: preliminary
A;Molecule type: mRNA
A;Residues: 1-669 <HAY>
A;Note: sequence inconsistent with the nucleotide translation
A;Note: sequence extracted from NCBI backbone (NCBIN:110845, NCBIP:110846)
C;Superfamily: type C retrovirus env polyprotein

Query Match 6.5%; Score 89.5; DB 2; Length 669; 

Best Local Similarity 23.8%; Pred. No. 8.3; 

Matches 62; Conservative 23; Mismatches 79; Indels 97; Gaps 15; 
Qy 24 DDGP - - PGS EDPERDDHEGQPRPRVPRKRGH I S PKSRPMANSTLLGLLAPPGEAWG I LGQ 81 

I II I =1 I III I II II | i 

Db 261 DSGPRVPIGPNPVLSDRRPPSRPRPTR SPPPSNST PTET 299 



Qy 82 PPNRPNHSPPPSAKVKKIFGWGDFYSNIKTV- -ALNL LVTGKI VDHG- 126 

I I Mh - : = I I I I I 11 = 1 I 

Db 300 PLTLP- -EPPPAGVENRLL NLVKGAYQALNLTS PDKTQECWLCLVSGP PYYEGV 351 

Qy 127 - -NGTFSVH --FQH NATGQGNI SI SLVPPSKAVEFHQEQQI FI EAKA 169 

Ihl I II 1 1 1 1 i I 

Db 352 AVTGTYSNHTSAPANCSVASQHKLTLSEVTGQG-LCIGAVPKTHQVLCNTTQK 4 03 

Qy 170 SKIFNCRMEWEKVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPF-KWCVYIAFYST 228 

11= I : . I I : I I = I U 

Db 4 04 TSDGSYYLV APTGTTWACSTGLTPCI STTI LNLTT 438 

Qy 229 DYRLVQKVCPDYNYHSDTPYY 24 9 

II := == I III : I 

Db 439 DYCVLVELW PRVTYHSPSYVY 459 



RESULT 10 
D86245 

hypothetical protein [imported] - Arabidopsis thaliana 
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 31-Mar-2001
C;Accession: D86245
R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.



Search completed: June 14, 2004, 19:01:36 
Job time : 62 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model

Run on: June 14, 2004, 19:00:26 ; Search time 22 Seconds
(without alignments)
591.352 Million cell updates/sec
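
The throughput figure can be cross-checked against other values in this run header: the query length (252 residues), the number of database residues searched (51625971), and the 22-second search time. A minimal Python sketch follows, assuming one "cell update" per query-residue by database-residue pair; that definition is inferred from the arithmetic, not stated in the report, and the function name is illustrative.

```python
def cell_updates_per_sec(query_len, db_residues, seconds):
    """Dynamic-programming cells filled per second.

    Assumption: one cell per (query residue, database residue) pair.
    """
    return query_len * db_residues / seconds


rate = cell_updates_per_sec(252, 51_625_971, 22)
print(f"{rate / 1e6:.3f} Million cell updates/sec")
```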



Perfect score: 1386
Sequence: 1 MQLTRCCFVFLVQGSLYLVI VQKVCPDYNYHSDTPYYPSG 252

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 389414 seqs, 51625971 residues

Total number of hits satisfying chosen parameters: 389414

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database : Issued_Patents_AA:*
1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:*
2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:*
3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:*
4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:*
5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:*
6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
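
The Gapop 10.0 and Gapext 0.5 settings in the run header suggest an affine gap model, in which opening a gap costs more than lengthening one. The Python sketch below shows how such a penalty is commonly computed. Whether the extension cost is also charged for the first gapped residue varies between tools and is not stated in this report, so the `extend_first` switch is a labeled assumption, as is the function name.

```python
def affine_gap_penalty(length, gap_open=10.0, gap_extend=0.5, extend_first=True):
    """Penalty for one gap of `length` residues under an affine model.

    extend_first=True  -> gap_open + gap_extend * length        (assumed convention)
    extend_first=False -> gap_open + gap_extend * (length - 1)  (alternative convention)
    """
    if length <= 0:
        return 0.0
    extended = length if extend_first else length - 1
    return gap_open + gap_extend * extended


for n in (1, 4, 10):
    print(n, affine_gap_penalty(n), affine_gap_penalty(n, extend_first=False))
```

Under either convention a single long gap is far cheaper than many short ones, which is consistent with the long indel runs visible in the low-scoring alignments of this listing.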
RESULT 1

US-09-976-594-112 

; Sequence 112, Application US/09976594
; Patent No. 6673549
; GENERAL INFORMATION:
; APPLICANT: Furness, Michael
; APPLICANT: Buchbinder, Jenny
; TITLE OF INVENTION: GENES EXPRESSED IN C3A LIVER CELL CULTURES TREATED WITH
STEROIDS
; FILE REFERENCE: PA-0041 US
; CURRENT APPLICATION NUMBER: US/09/976,594
; CURRENT FILING DATE: 2001-10-12
; PRIOR APPLICATION NUMBER: 60/240,409
; PRIOR FILING DATE: 2000-10-12
; NUMBER OF SEQ ID NOS: 1143
; SOFTWARE: PERL Program
; SEQ ID NO 112
; LENGTH: 252
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; NAME/KEY: misc_feature
; OTHER INFORMATION: Incyte ID No. 6673549 3070147CD1
US-09-976-594-112 

Query Match 99.4%; Score 1378; DB 4; Length 252; 

Best Local Similarity 99.6%; Pred. No. 5.2e-149; 

Matches 251; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 
Qy 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60

Qy 61 ANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120
      ||||||||||| ||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 ANSTLLGLLAPTGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120

Qy 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180

Qy 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKVVCVYIAFYSTDYRLVQKVCPDY 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKVVCVYIAFYSTDYRLVQKVCPDY 240

Qy 241 NYHSDTPYYPSG 252
       ||||||||||||
Db 241 NYHSDTPYYPSG 252
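
The Query Match percentage for this result appears to be the alignment score divided by the Perfect score from the run header (1386): 1378 / 1386 gives the reported 99.4%. The Python sketch below checks that reading. The relationship is inferred from the numbers in this report, and the function name is illustrative.

```python
PERFECT_SCORE = 1386  # "Perfect score" from the run header


def query_match(score, perfect_score=PERFECT_SCORE):
    """Alignment score as a percentage of the query's self-alignment score."""
    return 100.0 * score / perfect_score


# RESULT 1 (US-09-976-594-112) and RESULT 2 (US-09-110-517-2)
for score in (1378, 92):
    print(f"Score {score}: {query_match(score):.1f}%")
```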



RESULT 2 
US-09-110-517-2 

; Sequence 2, Application US/09110517A 

; Patent No. 6248520 

; GENERAL INFORMATION: 

; APPLICANT: Roeder, Robert G 

; APPLICANT: Fondell, Joseph D 

; APPLICANT: Yuan, Chao X 

; APPLICANT: Ito, Mitsuhiro
; TITLE OF INVENTION: NUCLEIC ACID MOLECULES ENCODING NUCLEAR HORMONE
; TITLE OF INVENTION: RECEPTOR COACTIVATORS AND USES THEREOF
; FILE REFERENCE: 600-1-224
; CURRENT APPLICATION NUMBER: US/09/110,517A
; CURRENT FILING DATE: 1998-07-06
; NUMBER OF SEQ ID NOS: 51
; SOFTWARE: PatentIn Ver. 2.0
; SEQ ID NO 2
; LENGTH: 1581
; TYPE: PRT
; ORGANISM: Homo sapiens
US-09-110-517-2 



Query Match 6.6%; Score 92; DB 3; Length 1581;



Best Local Similarity 23.0%; Pred. No. 0.68; 

Matches 43; Conservative 26; Mismatches 72; Indels 46; Gaps 9; 

Qy 35 RDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSA 94
I || II h I I II I II I : I : II I : 

Db 583

Qy 95

I I : :||| I h = | = : = = | = I 
Db 634 GNTKNHPMLMNLLKDNPAQDFSTLYGSSPLERQNSSSGSPRMEICSGS 684

Qy 147 -SLVPPSKAVEFHQEQQIFIEAKASK-IFNCRMEWEKVERGRRT--- 188

I II I II : I :: : | ||: | : :: | 

Db 685

Qy 189

Db 743


RESULT 3 

US-09-252-991A-21090 

; Sequence 21090, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO
PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 21090
; LENGTH: 125
; TYPE: PRT
; ORGANISM: Pseudomonas aeruginosa
; FEATURE:
; NAME/KEY: UNSURE
; LOCATION: (79)
; OTHER INFORMATION: Identity of amino acid at the above locations are
unknown.

US-09-252-991A-21090 

Query Match 6.6%; Score 91; DB 4; Length 125; 

Best Local Similarity 28.2%; Pred. No. 0.019; 

Matches 35; Conservative 19; Mismatches 44; Indels 26; Gaps 8; 

Qy 33 PERDDHEGQPRPRVP RKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQPPNRPNH 88 

III II Ihl I hill I III I 
Db 1 PERPDH-AQPQPHFPFCPARRRGH GRRRAARAWRARSAAIGE PQR 45 



Qy 89 S PPPSAKVKKI FGWGDFYSNI KTVALNLLVTGKI VD HGNGTFS - VHFQHNATGQGNI 144



: I : I :| I I III hi : I II- :| - || = 

Db 46 AAPATGIMETIRNYG- -YD I IMLVAL-LWASMFIGXWYHAYGTYAEIHTGRKTWGQFGL 102 

Qy 145 SISL 148 

Db 103 TVAI 106 



RESULT 4 

US-09-252-991A-24567

; Sequence 24567, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO
PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 24567
; LENGTH: 681
; TYPE: PRT
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-24567 

Query Match 6.4%; Score 88.5; DB 4; Length 681; 

Best Local Similarity 26.8%; Pred. No. 0.48; 

Matches 33; Conservative 10; Mismatches 31; Indels 49; Gaps 6; 

Qy 22 GQDDGPPGSEDPERDDHEGQPR PRVPRKRGHISPKSR P 59 

I Mill :| I I I I h II I | | 

Db 240 GRGDGPPGEPGADRLDSGGGPAHAPAAPTPPRLHRARQSLAPRPRRIPPRSRRGPAAPLP 299

Qy 60 MAN STLLGLLAPPGEAWG--IL GQP PNRPNHSPPP 92 

III ill II I II III I 

Db 300 LRDPAGAPGRRPRLETLAGAAAPPRRAGGGRVLRRSRRGSGLPCLRPVADPTLPATCPAP 359 

Qy 93 SAK 95 

h 

Db 360 GAR 362



RESULT 5 
US-09-187-331-5 

; Sequence 5, Application US/09187331
; Patent No. 6043056
; GENERAL INFORMATION:
; APPLICANT: Yue, Henry
; APPLICANT: Corley, Neil C.
; APPLICANT: Guegler, Karl J.
; APPLICANT: Gorgone, Gina A.
; APPLICANT: Baughn, Mariah R.
; TITLE OF INVENTION: CELL SURFACE GLYCOPROTEINS
; FILE REFERENCE: PF-0631 US
; CURRENT APPLICATION NUMBER: US/09/187,331
; CURRENT FILING DATE: 1998-11-06
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: PERL Program
; SEQ ID NO 5
; LENGTH: 180
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: g2499136
US-09-187-331-5 

Query Match 6.3%; Score 87.5; DB 3; Length 180; 

Best Local Similarity 20.8%; Pred. No. 0.082;

Matches 44; Conservative 26; Mismatches 77; Indels 65; Gaps 9; 

Qy 7 CFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEG QPRPRVPRKRGHISPKSRPM- 60 

Ih :| : | | :: | | : |:| | |:| |= :| 

Db 14 CFLMHARGQRDFDLADALDDPEPTKKPNSDI YPKPKPPYYPQPENPDSGGNI YPRPKPRP 73 

Qy 61 ANSTLLGLLAPPGEAWGIL GQPPNRPNHSPPPSAKVKKIFGWGDFYSNI 109 

II I : I h I II II I I Ih 

Db 74 QPQPGNS GNSGGYFNDVDRDDGRYPPRPRPRPPAG GGGGGYSS- 116 

Qy 110 KTVALNLLVTGKI VDHGNGTFS VHFQHNAT GQGN I S I SLVP P S KAVEFHQEQQ I F I E 166 

:|| : |::| :||: :| | =| 
Db 117 YGNSDNTHGGDHHSTYGNPEGNM VAKI VS P I VS VW VTLL 156 

Qy 167 AKASKI FNCRMEWEKVERGRRTSLCTHDPAKI 198 

hi II I hi : 

Db 157 GAAASYFKL NNRRNCFRTHEPENV 180



RESULT 6 
US-09-470-946-5 

; Sequence 5, Application US/09470946
; Patent No. 6358923
; GENERAL INFORMATION:
; APPLICANT: Yue, Henry
; APPLICANT: Corley, Neil C.
; APPLICANT: Guegler, Karl J.
; APPLICANT: Gorgone, Gina A.
; APPLICANT: Baughn, Mariah R.
; TITLE OF INVENTION: CELL SURFACE GLYCOPROTEINS
; FILE REFERENCE: PF-0631 US
; CURRENT APPLICATION NUMBER: US/09/470,946
; CURRENT FILING DATE: 1999-12-22
; EARLIER APPLICATION NUMBER: US 09/187,331
; EARLIER FILING DATE: 1998-11-06
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: PERL Program
; SEQ ID NO 5
; LENGTH: 180
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: g2499136
US-09-470-946-5 



Query Match 6.3%; Score 87.5; DB 4; Length 180; 

Best Local Similarity 20.8%; Pred. No. 0.082; 

Matches 44; Conservative 26; Mismatches 77; Indels 65; Gaps 9; 

Qy 7 CFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEG QPRPRVPRKRGHISPKSRPM- 60 

Ih =| : I I =: I I : hi I hi h I 

Db 14 CFLMHARGQRDFDLADALDDPEPTKKPNSDIYPKPKPPYYPQPENPDSGGNI YPRPKPRP 73 

Qy 61 ANSTLLGLLAPPGEAWGIL GQPPNRPNHSPPPSAKVKKI FGWGDFYSNI 109 

I I I : I h I II II II Ih 

Db 74 QPQPGNS GNSGGYFNDVDRDDGRYPPRPRPRPPAG GGGGGYSS- 116 

Qy 110 KTVALNLLVTGKI VDHGNGTFSVHFQHNAT GQGNI SI SLVPPSKAVEFHQEQQI FIE 166 

:|| : |::| :||: :| | :| 
Db 117 YGNSDNTHGGDHHSTYGNPEGNMVAKI VS P I VS VW- VTLL 156 

Qy 167 AKASKI FNCRMEWEKVERGRRTSLCTHDPAKI 198 

h I II Ihl 

Db 157 GAAASYFKL NNRRNCFRTHE PENV 180 



RESULT 7 

US-09-252-991A-23484

Sequence 23484, Application US/09252991A 
Patent No. 6551795 
GENERAL INFORMATION: 
APPLICANT: Marc J. Rubenfield et al.
TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
FILE REFERENCE: 107196.136
CURRENT APPLICATION NUMBER: US/09/252,991A
CURRENT FILING DATE: 1999-02-18
PRIOR APPLICATION NUMBER: US 60/074,788
PRIOR FILING DATE: 1998-02-18
PRIOR APPLICATION NUMBER: US 60/094,190
PRIOR FILING DATE: 1998-07-27
NUMBER OF SEQ ID NOS: 33142
SEQ ID NO 23484 
LENGTH: 371 
TYPE: PRT 

ORGANISM: Pseudomonas aeruginosa 
US-09-252-991A-23484 



Query Match 6.2%; Score 86; DB 4; Length 371; 

Best Local Similarity 23.3%; Pred. No. 0.37; 

Matches 44; Conservative 19; Mismatches 60; Indels 66; Gaps 9; 



Qy 25 DGPPGSEDPERDDHEGQPRP RVPRKRGH ISPKSRPMA 61 

III hi I I II I I :||| : h I 

Db 3 DGPDRPPRPDRPGHRVQGRPAYRRSPHRRGHRHHPRPGLRQGHRRQEGHPPLRPRLRAPR 62 



Qy 62 iG LL--APPGEA WGIL GQP PNRPNH 88
      III Ml II III
Db 63

Qy 89 SPPPSAKVKKI FGWGDFYSNI KTVALNLLVTGKI VDHGNGTF - S VHFQHNA 138
      II :h = I I =: = - I -hi I II
Db 123

Qy 139
      I I
Db 183

RESULT 8 

US-09-252-991A-20793

; Sequence 20793, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 20793
; LENGTH: 548
; TYPE: PRT
; ORGANISM: Pseudomonas aeruginosa
US-09-252-991A-20793 

Query Match 6.2%; Score 86; DB 4; Length 548; 

Best Local Similarity 34.6%; Pred. No. 0.66; 

Matches 28; Conservative 6; Mismatches 35; Indels 12; Gaps 5; 

Qy 30 SEDPERDDHE GQPRPRVPRKRGHISP- -KSRPMANSTLL- -GLL- -APPGEAWGIL 79 

I I : I I I I Ihl I : II II II I I II 

Db 223 SRS PRQQRH PGGTGGDSR PGA PRRRQRAD P WRRR PH PG PALL PR PLL PGG P PAATGG I P 282 

Qy 80 GQPPNRPN--HSPPPSAKVKK 98

II II I I I :h: 
Db 283 RQPDGRPCQWHLPAPQQRVRR 303 



RESULT 9 
US-08-494-168-8 

; Sequence 8, Application US/08494168
; Patent No. 5731192
; GENERAL INFORMATION:
; APPLICANT: Reeders, Stephen T.
; APPLICANT: Zhou, Jing
; TITLE OF INVENTION: Collagen COL4A6: Gene, Protein and Method
; TITLE OF INVENTION: of Detecting Collagen Deficiency
; NUMBER OF SEQUENCES: 10
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: Foley & Lardner
; STREET: 3000 K Street, N.W., Suite 500
; CITY: Washington, D.C.
; COUNTRY: USA
; ZIP: 20007-5109
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.25
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/494,168
; FILING DATE:
; CLASSIFICATION: 435
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: US 08/112,465
; FILING DATE: 27-AUG-1993
; ATTORNEY/AGENT INFORMATION:
; NAME: SAXE, Bernhard D.
; REGISTRATION NUMBER: 28,665
; REFERENCE/DOCKET NUMBER: 40397/104/BABR
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: (202) 672-5300
; TELEFAX: (202) 672-5399
; TELEX: 904136
; INFORMATION FOR SEQ ID NO: 8:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 549 amino acids
; TYPE: amino acid
; TOPOLOGY: linear
US-08-494-168-8 

Query Match 6.2%; Score 85.5; DB 1; Length 549;

Best Local Similarity 25.9%; Pred. No. 0.76; 

Matches 38; Conservative 11; Mismatches 57; Indels 41; Gaps 6; 

Qy 26 GPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQ 81 

I I I I : I : I I : II I hi III 1=11 

Db 157 GPPGPQGPKGQKGEPYALPKEERDRYRGEP GEPGLVGFQGPPGRP-GHVGQMGPV 210 

Qy 82 -PPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTGKIVDHGNGTFSVHFQHNATG 140

I II II I |: : | | : | : | 

Db 211 GAPGRPGPPGPPGPK GQQGNRGLGFYGVKGEKGDVG 246

Qy 141 Q GNISISLVP- -PSKAVEFHQEQ 161 

I I hi I I II h 

Db 247 QPGPNGI PSDTLHPI IAPTGVTFHPDQ 273 



RESULT 10 

US-09-252-991A-27341 

; Sequence 27341, Application US/09252991A
; Patent No. 6551795
; GENERAL INFORMATION:
; APPLICANT: Marc J. Rubenfield et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PSEUDOMONAS
; TITLE OF INVENTION: AERUGINOSA FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 107196.136
; CURRENT APPLICATION NUMBER: US/09/252,991A
; CURRENT FILING DATE: 1999-02-18
; PRIOR APPLICATION NUMBER: US 60/074,788
; PRIOR FILING DATE: 1998-02-18
; PRIOR APPLICATION NUMBER: US 60/094,190
; PRIOR FILING DATE: 1998-07-27
; NUMBER OF SEQ ID NOS: 33142
; SEQ ID NO 27341
; LENGTH: 1073
; TYPE: PRT
; ORGANISM: Pseudomonas aeruginosa
; FEATURE:
; NAME/KEY: UNSURE
; LOCATION: (803)
; OTHER INFORMATION: Identity of amino acid at the above locations are unknown.

US-09-252-991A-27341 

Query Match 6.1%; Score 85; DB 4; Length 1073; 

Best Local Similarity 28.9%; Pred. No. 2.4; 

Matches 28; Conservative 6; Mismatches 43; Indels 20; Gaps 3; 

Qy 22 GQDDGPPGSED- -PERDDHEGQPRPRVPR KRGHI SPKSRPMANSTLLG 67 

I = I III I I I I I II I II h 
Db 524 GAEARPAGSSDRPPERDVAAADPHPRTGRYRGPGAAREHRGNRGGSCPRRSPVSAG 579

Qy 68 LLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGD 104

III III I h 'III 

Db 580 --APPAPHPNIQENPNGRQQEQGPAVARCGALHGLGD 614



RESULT 11 
US-09-396-149-6 

; Sequence 6, Application US/09396149
; Patent No. 6538176
; GENERAL INFORMATION:
; APPLICANT: Mahajan, Pramod B.
; TITLE OF INVENTION: Maize Replication Protein A and Use
; FILE REFERENCE: 5718-59
; CURRENT APPLICATION NUMBER: US/09/396,149
; CURRENT FILING DATE: 1999-09-15
; NUMBER OF SEQ ID NOS: 22
; SOFTWARE: FastSEQ for Windows Version 3.0
; SEQ ID NO 6
; LENGTH: 609
; TYPE: PRT
; ORGANISM: Xenopus laevis
US-09-396-149-6 

Query Match 6.1%; Score 84.5; DB 4; Length 609; 

Best Local Similarity 23.2%; Pred. No. 1.2; 

Matches 43; Conservative 26; Mismatches 65; Indels 51; Gaps 10; 



Qy 26 GPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQPPNR 85

I h I I I I I : I I Ml I 

Db 110 GKIGNPQPYND GQPQPAAP APASAPAPAPSKL Q 142 

Qy 86 PNHSPPPSAK- -VKKIFGWGDFY SNIKTVALNLL VTGKI VDHGNGTF 130 

I = II II hill I I I = I I := : I 
Db 143 NNSAP P PSMNRGTS KLFGGGSLLNTPGGSQS KWP I ASLN P YQS KWTVRARVTNKG 198 

Qy 131 SVHFQHNATGQGNI -SI SLVPPS KAVEFH - QEQQI F I EAKASKI FNCRMEWEKVERG 185 

|: |:| : || :| | :| |: | : | : :|:: |: 
Db 199 QIRTWSNSRGEGKLFSIEMVDESGEIRATAFNEQADKFFSIIEVNKVYYFSKGTLKIANK 258 

Qy 186 RRTSL 190 

= Ih 

Db 259 QYTSV 263 



RESULT 12 
US-09-187-331-1 

; Sequence 1, Application US/09187331
; Patent No. 6043056
; GENERAL INFORMATION:
; APPLICANT: Yue, Henry
; APPLICANT: Corley, Neil C.
; APPLICANT: Guegler, Karl J.
; APPLICANT: Gorgone, Gina A.
; APPLICANT: Baughn, Mariah R.
; TITLE OF INVENTION: CELL SURFACE GLYCOPROTEINS
; FILE REFERENCE: PF-0631 US
; CURRENT APPLICATION NUMBER: US/09/187,331
; CURRENT FILING DATE: 1998-11-06
; NUMBER OF SEQ ID NOS: 6
; SOFTWARE: PERL Program
; SEQ ID NO 1
; LENGTH: 195
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: 22 97891
US-09-187-331-1



Query Match 6.1%; Score 84; DB 3; Length 195; 

Best Local Similarity 20.3%; Pred. No. 0.23; 

Matches 45; Conservative 27; Mismatches 80; Indels 70; Gaps 11; 

Qy 7 CFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEG QPRPRVPRKRGHISPKSRPM- 60 

||: :| I I - I I : hi I hi h =| 

Db 14 CFLMHARGQRDFDLADALDDPEPTKKPNSDI YPKPKPPYYPQPENPDSGGNIYPRPKPRP 73 

Qy 61 ANSTLLGLLAPPGEAWGIL GQPPNRPNHSPPPSAKVKKI FGWGDF- -YS 107 

II I : I h I II II I I : I 

Db 74 QPQPGNS GNSGG YFNDVDRDDGR Y P PR PRPR P PAGG GGGGYSSYG 118 



Qy 108 NI KTVALNLLVTGKI VDHGNGTFSVHFQ HNAT GQGNISISLVPPSKAVE 156 

I II I : : |::| Uh = I I • I 

Db 119 NSDNT HGRGGYRPNSRYGNTYGGDHHSTYGNPEGNMVAKI VS P I VSW 166 



Qy 157 FHQEQQI FI EAKASKI FNCRMEWEKVERGRRTSLCTHDPAKI 198 

: : h I II I hi : 

Db 167 V VTLLGAAASYFKL NNRRNCFRTHEPENV 195 



RESULT 13 
US-09-470-946-1 

Sequence 1, Application US/09470946 
Patent No. 6358923 
GENERAL INFORMATION: 



APPLICANT: Yue, Henry
APPLICANT: Corley, Neil C.
APPLICANT: Guegler, Karl J.
APPLICANT: Gorgone, Gina A.
APPLICANT: Baughn, Mariah R.
TITLE OF INVENTION: CELL SURFACE GLYCOPROTEINS
FILE REFERENCE: PF-0631 US
CURRENT APPLICATION NUMBER: US/09/470,946
CURRENT FILING DATE: 1999-12-22
EARLIER APPLICATION NUMBER: US 09/187,331
EARLIER FILING DATE: 1998-11-06
NUMBER OF SEQ ID NOS: 6
SOFTWARE: PERL Program
SEQ ID NO 1
LENGTH: 195
TYPE: PRT
ORGANISM: Homo sapiens
FEATURE:

OTHER INFORMATION: 22 97891 
US-09-470-946-1 

Query Match 6.1%; Score 84; DB 4; Length 195; 

Best Local Similarity 20.3%; Pred. No. 0.23; 

Matches 45; Conservative 27; Mismatches 80; Indels 70; Gaps 11; 

Qy 7 CFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEG QPRPRVPRKRGHISPKSRPM- 60 

lh H = | | :: | | : |:| | |:| |: :| 

Db 14 CFLMHARGQRDFDLADALDDPEPTKKPNSDI YPKPKPP YYPQPENPDSGGNI YPRPKPRP 73 

Qy 61 ANSTLLGLLAPPGEAWGIL GQPPNRPNHSPPPSAKVKKI FGWGDF- - YS 107 

I I I : I h I II II I I : I 

Db 74 QPQPGNS GNSGG YFNDVDRDDGR Y P PR PR PRP PAGG GGGGYSSYG 118 

Qy 108 NIKTVALNLLVTGKIVDHGNGTFSVHFQ HNAT GQGN ISIS L VP P S KAVE 156 

I III::: h = | Uh : I I =| 

Db 119 NSDNT-- HGRGGYRPNSRYGNTYGGDHHSTYGNPEGNMVAKIVSPI VSW 166

Qy 157 FHQEQQI FI EAKASKI FNCRMEWEKVERGRRTSLCTHDPAKI 198 

: : h I II III : 

Db 167 V VTLLGAAASYFKL NNRRNCFRTHEPENV 195 



RESULT 14 
US-08-484-126-3 

; Sequence 3, Application US/08484126 
; Patent No. 5985655 



; GENERAL INFORMATION: 

APPLICANT: Anderson, W. French 
APPLICANT: Baltrucki, Leon F. 
APPLICANT: Mason, James M. 

TITLE OF INVENTION: Targetable Vector Particles 
NUMBER OF SEQUENCES: 8 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Carella, Byrne, Bain, Gilfillan, 

ADDRESSEE: Cecchi , Stewart & Olstein 

STREET: 6 Becker Farm Road 

CITY: Roseland 

STATE: New Jersey 

COUNTRY: USA

ZIP: 07068 
COMPUTER READABLE FORM: 

MEDIUM TYPE: 3.5 inch diskette 

COMPUTER: IBM PS/2 

OPERATING SYSTEM: MS-DOS 

SOFTWARE: Word Perfect 5.1 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/08/484,126

FILING DATE: 07-JUN-1995 

CLASSIFICATION: 435 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/326,347 

FILING DATE: 20-OCT-1994 

APPLICATION NUMBER: 08/973,307 

FILING DATE: 09-NOV-1992 
ATTORNEY/AGENT INFORMATION: 

NAME: Lillie, Raymond J. 

REGISTRATION NUMBER: 31,778 

REFERENCE/DOCKET NUMBER: 271010-281 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 201-994-1700 

TELEFAX: 201-994-1744 
; INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 453 amino acids 
; TYPE: amino acid 

STRANDEDNESS : 

TOPOLOGY: linear 
MOLECULE TYPE: protein 
FEATURE : 

NAME/KEY: xenotropic gp70 protein 
US-08-484-126-3 

Query Match 6.1%; Score 84; DB 2; Length 453; 

Best Local Similarity 23.4%; Pred. No. 0.84; 

Matches 60; Conservative 25; Mismatches 69; Indels 102; Gaps 16; 

Qy 44 PRVPRKRGHISPKSRPMANSTLLGLLAPPGEAWGILGQPPNRPNH SPPPSAK 95 

I I I I II I = I I :| :|h I UNI 

Db 243 PRVP IGP NPVITDQLPPSQPVQIMLPRPPHPPPSGTVSMVPGAPPPSQQ 291 

Qy 96 VKKI FGWGDFYSNI KTVALNL LVTGKI VDHG NGTFSVH 133 

I II h I I I I Ihl I 11 = 1 I 

Db 292 P GTGDRLLNLVEGAYQALNLTSPDKTQECWLCLVSGPPYYEGVAVLGTYSNHTSAP 347 



Qy 134 FQH NATGQGNI S I SLVPPSKAVEFHQEQQI FI EAKASKI FNCRMEWEKV 182 

II MM : : II : II 
Db 348 ANCSVASQHKLTLSEVTGQG-LCVGAVPKT HQ 378 

Qy 183 ERGRRTSLC THDPAKICSRDHAQSSATWSCSQPF-KWCVYIAFYSTDYRLVQKV 236 

:|| | | : : | : |:|: : : :||| :: :: 

Db 379 ALCNTTQKTSDGSYYLA---APAGTIWACNTGLTPCLSTTVLNLTTDYCVLVEL 429

Qy 237 CPDYNYHSDTPYYPSG 252 

I III I I I 
Db 430 WPKVTYHS--PDYVYG 443



RESULT 15 
US-09-374-909-3 

; Sequence 3, Application US/09374909 
; Patent No. 6503501 

GENERAL INFORMATION: 

APPLICANT: Anderson, W. French 
; Baltrucki, Leon F. 

Mason, James M. 
TITLE OF INVENTION: Targetable Vector Particles 
NUMBER OF SEQUENCES : 8 
CORRESPONDENCE ADDRESS: 

ADDRESSEE: Carella, Byrne, Bain, Gilfillan, 
Cecchi, Stewart & Olstein 

STREET: 6 Becker Farm Road 

CITY: Roseland 

STATE: New Jersey 

COUNTRY: USA 

ZIP: 07068 
COMPUTER READABLE FORM: 

MEDIUM TYPE: 3.5 inch diskette 

COMPUTER: IBM PS/2 

OPERATING SYSTEM: MS-DOS 

SOFTWARE: Word Perfect 5.1 
CURRENT APPLICATION DATA: 

APPLICATION NUMBER: US/09/374,909

FILING DATE: 13-Aug-1999

CLASSIFICATION: <Unknown> 
PRIOR APPLICATION DATA: 

APPLICATION NUMBER: 08/484,126 

FILING DATE: <Unknown> 

APPLICATION NUMBER: 08/973,307 

FILING DATE: 09-NOV-1992 
ATTORNEY/AGENT INFORMATION: 

NAME: Lillie, Raymond J. 

REGISTRATION NUMBER: 31,778 

REFERENCE/DOCKET NUMBER: 271010-281 
TELECOMMUNICATION INFORMATION: 

TELEPHONE: 201-994-1700 

TELEFAX: 201-994-1744 
INFORMATION FOR SEQ ID NO: 3: 
SEQUENCE CHARACTERISTICS: 

LENGTH: 453 amino acids

TYPE: amino acid 



Perfect score: 1386 

Sequence: 1 MQLTRCCFVFLVQGSLYLVI VQKVCPDYNYHSDTPYYPSG 252 

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5

Searched: 6019581 seqs, 976053577 residues 

Total number of hits satisfying chosen parameters: 6019581 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0%

Maximum Match 100% 
Listing first 45 summaries 

Database : Pending_Patents_AA_Main: * 



Perfect score: 1386 

Sequence: 1 MQLTRCCFVFLVQGSLYLVI VQKVCPDYNYHSDTPYYPSG 252 

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 288743 seqs, 43614698 residues 

Total number of hits satisfying chosen parameters: 288743 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0%

Maximum Match 100% 
Listing first 45 summaries 

Database : Pending_Patents_AA_New: * 



Qy 7 CFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEG QPRPRVPRKRGHISPKSRPM- 60
     ||: :| : | | :: | | : |:| | |:| |: :|
Db 14 CFLMHARGQRDFDLADALDDPEPTKKPNSDIYPKPKPPYYPQPENPDSGGNIYPRPKPRP 73

Qy 61 ANSTLLGLLAPPGEAWGIL GQPPNRPNHSPPPSAKVKKIFGWGDFYSNI 109
      || | : | |: | || || || ||:
Db 74 QPQPGNS GNSGGYFNDVDRDDGRYPPRPRPRPPAG GGGGGYSS- 116

Qy 110 KTVALNLLVTGKIVDHGNGTFSVHFQHNAT GQGNISISLVPPSKAVEFHQEQQIFIE 166
       :|| : |::| :||: :| | :|
Db 117 YGNSDNTHGGDHHSTYGNP EGNMVAKI VS P I VSVW VTLL 156

Qy 167 AKASKIFNCRMEWEKVERGRRTSLCTHDPAKI 198
       |: | || ||:|
Db 157 GAAASYFKL NNRRNCFRTHEPENV 180

Search completed: June 14, 2004, 19:03:57 
Job time : 41 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: June 14, 2004, 18:55:36; Search time 18 Seconds

(without alignments) 
728.982 Million cell updates/sec 

Title: US-10-063-599-92 
Perfect score: 1386 

Sequence: 1 MQLTRCCFVFLVQGSLYLVI VQKVCPDYNYHSDTPYYPSG 252 

Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 141681 seqs, 52070155 residues 

Total number of hits satisfying chosen parameters: 141681 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : SwissProt_42 : * 
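
The "Perfect score" in the header is presumably the score of the query aligned against itself under the stated scoring table, which is consistent with the 100% match in RESULT 1 of this search scoring 1386. The sketch below illustrates that reading only: it sums the diagonal (identity) entries of BLOSUM62 over a sequence. The function name is hypothetical, and only the first 20 residues of the query are used, as printed in the header line, because the header elides the middle of the sequence.

```python
# Diagonal (self-match) scores of the standard BLOSUM62 matrix.
BLOSUM62_DIAGONAL = {
    "A": 4, "R": 5, "N": 6, "D": 6, "C": 9,
    "Q": 5, "E": 5, "G": 6, "H": 8, "I": 4,
    "L": 4, "K": 5, "M": 5, "F": 6, "P": 7,
    "S": 4, "T": 5, "W": 11, "Y": 7, "V": 4,
}


def self_score(sequence: str) -> int:
    """Score of a gap-free alignment of a sequence with itself.

    Assumption: this is what the report calls the "Perfect score".
    """
    return sum(BLOSUM62_DIAGONAL[residue] for residue in sequence)


# First 20 residues of the query, as shown in the "Sequence:" header line.
query_prefix = "MQLTRCCFVFLVQGSLYLVI"
print(self_score(query_prefix))  # 104
```

Because a self-alignment has no gaps, the gap penalties (Gapop, Gapext) would not enter this figure.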

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 

SUMMARIES 

ALIGNMENTS



RESULT 1 
NXP3_HUMAN 

ID NXP3_HUMAN STANDARD; PRT; 252 AA. 

AC O95157; Q8TBF6; Q9ULR1;

DT 16-OCT-2001 (Rel. 40, Created)

DT 28-FEB-2003 (Rel. 41, Last sequence update)

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE Neurexophilin 3 precursor. 

GN NXPH3 OR NPH3 OR KIAA1159. 

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.

OX NCBI_TaxID=9606;

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Hypothalamus; 

RX MEDLINE=22388257; PubMed=12477932;

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,

RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,

RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length 

RT human and mouse cDNA sequences."; 

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002). 

RN [2] 

RP SEQUENCE OF 32-252 FROM N.A. 

RC TISSUE=Brain; 

RX MEDLINE=20039618; PubMed=10574461;

RA Hirosawa M., Nagase T., Ishikawa K.-I., Kikuno R., Nomura N.,

RA Ohara O.;

RT "Characterization of cDNA clones selected by the GeneMark analysis 

RT from size-fractionated cDNA libraries from human brain."; 

RL DNA Res. 6:329-336(1999). 

RN [3] 

RP SEQUENCE OF 80-252 FROM N.A. 

RX MEDLINE=98237742; PubMed=9570794;

RA Missler M., Suedhof T.C.;

RT "Neurexophilins form a conserved family of neuropeptide-like

RT glycoproteins.";

RL J. Mol. Neurosci. 18:3630-3638(1998). 

CC -!- FUNCTION: May be signaling molecules that resemble neuropeptides. 
CC Ligand for alpha-neurexins (By similarity).

CC -!- SUBCELLULAR LOCATION: Secreted (Potential). 

CC -!- TISSUE SPECIFICITY: Highest level in brain. 

CC -!- PTM: May be proteolytically processed at the boundary between the 

CC N-terminal nonconserved and the central conserved domain in 

CC neuron-like cells (By similarity).

CC -!- SIMILARITY: Belongs to the neurexophilin family. 

CC 



CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; BC022541; AAH22541.1; 

DR EMBL; AB032985; BAA86473.1, 

DR EMBL; AF043468; AAD02281.1, 

DR Genew; HGNC:8077; NXPH3.

DR MIM; 604636; -.

DR GO; GO:0005576; C:extracellular; NAS.

DR GO; GO:0007218; P:neuropeptide signaling pathway; NAS.

KW Signal; Glycoprotein.
IV (LINKER DOMAIN).
.) (POTENTIAL)
127
CONFLICT
E -> G (IN REF. 1).




SQ SEQUENCE 252 AA; 28127 MW; 74D2B3D5A89D221F CRC64;

Query Match 100.0%; Score 1386; DB 1; Length 252;

Best Local Similarity 100.0%; Pred. No. 3.6e-107;

Matches 252; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60
     ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60

Qy 61 ANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120
      ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 61 ANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120

Qy 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180

Qy 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKWCVYIAFYSTDYRLVQKVCPDY 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKWCVYIAFYSTDYRLVQKVCPDY 240

Qy 241 NYHSDTPYYPSG 252
       ||||||||||||
Db 241 NYHSDTPYYPSG 252



RESULT 2 
NXP3_MOUSE

ID NXP3_MOUSE STANDARD; PRT; 252 AA. 



AC Q91VX5; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update)

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE Neurexophilin 3 precursor. 

GN NXPH3.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Breast;

RX MEDLINE=22388257; PubMed=12477932;

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,

RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,

RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length

RT human and mouse cDNA sequences.";

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002). 

RN [2] 

RP TISSUE SPECIFICITY. 

RX MEDLINE=98237742; PubMed=9570794;

RA Missler M., Suedhof T.C.;

RT "Neurexophilins form a conserved family of neuropeptide-like

RT glycoproteins.";

RL J. Neurosci. 18:3630-3638(1998). 

CC -!- FUNCTION: May be signaling molecules that resemble neuropeptides. 
CC Ligand for alpha-neurexins.

CC -!- SUBCELLULAR LOCATION: Secreted (Potential). 

CC -!- TISSUE SPECIFICITY: Highest level in brain, present also in lung, 
CC kidney and testis. 

CC -!- PTM: May be proteolytically processed at the boundary between the

CC N-terminal nonconserved and the central conserved domain in

CC neuron-like cells (By similarity).

CC -!- SIMILARITY: Belongs to the neurexophilin family. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 



CC or send an email to license@isb-sib.ch).

CC

DR EMBL; BC007167; AAH07167.1; 

DR MGD; MGI:1336188; Nxph3.

DR GO; GO:0005102; F:receptor binding; IDA.

KW Signal; Glycoprotein.
SQ SEQUENCE 252 AA; 28183 MW; A770645706435A7C CRC64;



Query Match 96.8%; Score 1341; DB 1; Length 252; 

Best Local Similarity 96.4%; Pred. No. 1.8e-103; 

Matches 243; Conservative 3; Mismatches 6; Indels 0; Gaps 0; 
Qy 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60
     |||||||||||||||||||||||||||||||||| ||||||||||||||||||||||||:
Db 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPEHDDHEGQPRPRVPRKRGHISPKSRPL 60

Qy 61 ANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120
      |||||||||||||| ||:||||||||  || || ||||||||||||||||||||||||||
Db 61 ANSTLLGLLAPPGEVWGVLGQPPNRPKQSPLPSTKVKKIFGWGDFYSNIKTVALNLLVTG 120

Qy 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180
       ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
Db 121 KIVDHGNGTFSVHFRHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180

Qy 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKWCVYIAFYSTDYRLVQKVCPDY 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKWCVYIAFYSTDYRLVQKVCPDY 240

Qy 241 NYHSDTPYYPSG 252
       ||||||||||||
Db 241 NYHSDTPYYPSG 252



RESULT 3 
NXP3_RAT 

ID NXP3_RAT STANDARD; PRT; 252 AA. 

AC Q9Z2N5; 

DT 16-OCT-2001 (Rel. 40, Created)

DT 16-OCT-2001 (Rel. 40, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE Neurexophilin 3 precursor. 

GN NXPH3 OR NPH3.

OS Rattus norvegicus (Rat).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus.

OX NCBI_TaxID=10116; 
RN [1] 



RP SEQUENCE FROM N.A.
RC TISSUE=Brain;
RX MEDLINE=98237742; PubMed=9570794;
RA Missler M., Suedhof T.C.;
RT "Neurexophilins form a conserved family of neuropeptide-like
RT glycoproteins.";
RL J. Mol. Neurosci. 18:3630-3638(1998).
CC -!- FUNCTION: May be signaling molecules that resemble neuropeptides.
CC Ligand for alpha-neurexins (By similarity).
CC -!- SUBCELLULAR LOCATION: Secreted (Potential).
CC -!- TISSUE SPECIFICITY: Brain. Detected in several other tissues.
CC -!- PTM: May be proteolytically processed at the boundary between the
CC N-terminal nonconserved and the central conserved domain in
CC neuron-like cells (By similarity).
CC -!- SIMILARITY: Belongs to the neurexophilin family.
CC This SWISS-PROT entry is copyright. It is produced through a collaboration
CC between the Swiss Institute of Bioinformatics and the EMBL outstation -
CC the European Bioinformatics Institute. There are no restrictions on its
CC use by non-profit institutions as long as its content is in no way
CC modified and this statement is not removed. Usage by and for commercial
CC entities requires a license agreement (See http://www.isb-sib.ch/announce/
CC or send an email to license@isb-sib.ch).
.) (POTENTIAL)
.) (POTENTIAL)
.) (POTENTIAL)
.) (POTENTIAL)
SQ SEQUENCE 252 AA; 28225 MW; 5FF10603CE66D8BA

Query Match 96.5%; Score 1337; DB 1; Length 252;

Best Local Similarity 96.0%; Pred. No. 3.8e-103;

Matches 242; Conservative 4; Mismatches 6; Indels 0; Gaps 0;

Qy 1 MQLTRCCFVFLVQGSLYLVICGQDDGPPGSEDPERDDHEGQPRPRVPRKRGHISPKSRPM 60
     |||||||||||||||||||||||:|||||||||| ||||||||||||||||||||||||:
Db 1 MQLTRCCFVFLVQGSLYLVICGQEDGPPGSEDPEHDDHEGQPRPRVPRKRGHISPKSRPL 60

Qy 61 ANSTLLGLLAPPGEAWGILGQPPNRPNHSPPPSAKVKKIFGWGDFYSNIKTVALNLLVTG 120
      |||||||||||||| |||||||||||  || || ||||||||||||||||||||||||||
Db 61 ANSTLLGLLAPPGEVWGILGQPPNRPKQSPLPSTKVKKIFGWGDFYSNIKTVALNLLVTG 120

Qy 121 KIVDHGNGTFSVHFQHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180
       ||||||||||||||:|||||||||||||||||||||||||||||||||||||||||||||
Db 121 KIVDHGNGTFSVHFRHNATGQGNISISLVPPSKAVEFHQEQQIFIEAKASKIFNCRMEWE 180

Qy 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKWCVYIAFYSTDYRLVQKVCPDY 240
       M 1 1 II M 1 1 1 III I II M 1 1 1 1 M 1 1 1 1 II 1 1 1 II Ml 1 1 II II I II Ml M 1 1 1 MM
Db 181 KVERGRRTSLCTHDPAKICSRDHAQSSATWSCSQPFKIVCVYIAFYSTDYRLVQKVCPDY 240